
A software framework for data dimensionality reduction: application to chemical crystallography



Abstract

Materials science research has witnessed an increasing use of data mining techniques in establishing process-structure-property relationships. Significant advances in high-throughput experiments and computational capability have resulted in the generation of huge amounts of data. Various statistical methods are currently employed to reduce the noise, redundancy, and dimensionality of the data to make analysis more tractable. Popular reduction methods, such as principal component analysis, assume a linear relationship between the input and output variables. Recent developments in non-linear reduction (neural networks, self-organizing maps), though successful, have computational issues associated with convergence and scalability. Another significant barrier to using dimensionality reduction techniques in materials science is their lack of ease of use, owing to their complex mathematical formulations. This paper reviews various spectral-based techniques that efficiently unravel linear and non-linear structures in the data, which can subsequently be used to tractably investigate process-structure-property relationships. In addition, we describe techniques (based on graph-theoretic analysis) to estimate the optimal dimensionality of the low-dimensional parametric representation. We show how these techniques can be packaged into a modular, computationally scalable software framework with a graphical user interface: the Scalable Extensible Toolkit for Dimensionality Reduction (SETDiR). This interface helps to separate the mathematical and computational aspects from the materials science applications, thus significantly enhancing utility to the materials science community. The applicability of this framework in constructing reduced-order models of complicated materials datasets is illustrated with an example dataset of apatites described in structural descriptor space. Cluster analysis of the low-dimensional plots yielded interesting insights into the correlation between several structural descriptors, such as ionic radius and covalence, and characteristic properties such as apatite stability. This information is crucial, as it can promote the use of apatite materials as a potential host system for immobilizing toxic elements.
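
As a rough illustration of the linear case named in the abstract, principal component analysis can be written as a spectral method: an eigen-decomposition of the data covariance matrix followed by projection onto the leading eigenvectors. The sketch below is not taken from SETDiR. The function name `pca_spectral` and the synthetic dataset are assumptions made purely for illustration, and only NumPy is used.

```python
import numpy as np


def pca_spectral(X, n_components):
    """Project the rows of X onto the leading principal components.

    Hypothetical helper for illustration; not part of SETDiR.
    Returns the projected data and the explained-variance ratios.
    """
    Xc = X - X.mean(axis=0)                   # center each descriptor
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric eigen-solver, ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    return Xc @ eigvecs[:, :n_components], explained


# Synthetic stand-in for a descriptor table: 200 samples with 5 descriptors
# that are noisy linear mixtures of 2 hidden factors (assumed data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

Y, explained = pca_spectral(X, n_components=2)
print(Y.shape)                 # shape of the reduced representation
print(explained[:2].sum())     # share of variance kept by two components
```

Because the synthetic descriptors depend linearly on two hidden factors, two components retain almost all of the variance. When the dependence is non-linear, this linear projection is no longer sufficient, which is the limitation that motivates the non-linear spectral techniques reviewed in the paper.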
